Abstract

Data storage plays an essential role in today's fast-growing data-intensive network
services. New standards and products emerge very rapidly for networked data storage.
Given the mature Internet infrastructure, the overwhelming preference recently is to use
IP for storage networking because of economy and convenience. iSCSI is one of the most
recent standards that allows SCSI protocols to be carried out over IP networks. However,
there are many disparities between SCSI and IP in terms of speeds, bandwidths, data unit
sizes, and design considerations that prevent fast and efficient deployment of SAN
(Storage Area Network) over IP. This paper introduces STICS (SCSI-To-IP Cache Storage),
a novel storage architecture that couples reliable and high-speed data caching with
low-overhead conversion between SCSI and IP protocols. A STICS block consists of one or
several storage devices such as disks or RAID, and an intelligent processing unit with
CPU and RAM. The storage devices are used to cache and store data while the intelligent
processing unit carries out the caching algorithm, protocol conversion, and
self-management functions. Through an efficient caching algorithm and localization of
certain unnecessary protocol overheads, STICS can significantly improve performance,
reliability, manageability, and scalability over current iSCSI systems. Furthermore,
STICS can be used as a basic plug-and-play building block for data storage over IP.
Analogous to "cache memory" invented several decades ago for bridging the speed gap
between CPU and memory, STICS is the first-ever "cache storage" for bridging the gap
between SCSI and IP, making it possible to build efficient SAN over IP. We have carried
out a partial implementation and simulation experiments to study the performance
potential of STICS. Numerical results using the popular PostMark benchmark program and
EMC's trace have shown dramatic performance gain over the iSCSI implementation.

Keywords: Cache, Disk I/O, Networked Storage, NAS, SAN, iSCSI 
1. Introduction



As we enter a new era of computing, data storage has changed its role from "secondary"
with respect to CPU and RAM to primary importance in today's information world.
Online data storage doubles every 9 months [2] due to ever-growing demand for
networked information services [12, 24]. While the importance of data storage is a
well-known fact, published literature reporting networked storage architectures is
limited in the computer architecture research community. We believe this situation
should and will change very quickly as information has surpassed raw computational
power as the important commodity. As stated in [4], "In our increasingly
Internet-dependent business and computing environment, network storage is the computer."

In general, networked storage architectures have evolved from network-attached storage
(NAS) [4, 6, 16], storage area network (SAN) [11, 17, 19], to most recent storage over IP
(iSCSI) [6, 9, 21]. NAS architecture allows a storage system/device to be directly
connected to a standard network, typically via Ethernet. Clients in the network can
access the NAS directly. A NAS-based storage subsystem has a built-in file system to
provide clients with file system functionality. SAN technology, on the other hand,
provides a simple block-level interface for manipulating nonvolatile magnetic media.
Typically, a SAN consists of networked storage devices interconnected through a dedicated
Fibre Channel network. The basic premise of a SAN is to replace the current
"point-to-point" infrastructure of server-to-storage communications with one that allows
"any-to-any" communications. A SAN provides high connectivity, scalability, and
availability using a specialized network interface: the Fibre Channel network. Deploying
such a specialized network usually introduces additional cost for implementation,
maintenance, and management. iSCSI is the most recent emerging technology with the goal
of implementing the SAN technology over the better-understood and mature network
infrastructure: the Internet (TCP/IP).

Implementing SAN over IP brings economy and convenience, whereas it also raises issues
such as performance and reliability. Currently, there are basically two existing
approaches: one encapsulates the SCSI protocol in TCP/IP at the host bus adapter (HBA)
level [21], and the other carries out SCSI and IP protocol conversion at a specialized
switch [17]. Both approaches have severe performance limitations. Encapsulating the SCSI
protocol over IP requires a significant amount of overhead traffic for SCSI command
transfers and handshaking over the Internet. Converting protocols at a switch places a
special burden on an already overloaded switch and creates another piece of specialized
networking equipment in a SAN. Furthermore, the Internet was not designed for
transferring storage data blocks. Many features such as MTU (Maximum Transfer Unit),
datagram fragmentation, routing, and congestion control may become obstacles to providing
enough instant bandwidth for large block transfers of storage data.

This paper introduces a new storage architecture called STICS (SCSI-To-IP Cache
Storage) aimed at solving the above-mentioned issues facing storage designers who
implement SAN over the Internet. A typical STICS block consists of one or several
storage devices such as disks or RAID and an intelligent processing unit with an
embedded processor and sufficient RAM. It has two standard interfaces: one is a SCSI
interface and the other is a standard Ethernet interface. Besides the regular data
storage in STICS, one storage device is used as a nonvolatile cache that caches data
coming from possibly two directions: block data from the SCSI interface and network data
from the Ethernet interface. In addition to standard SCSI and IP protocols running on the
intelligent processing unit, a local file system also resides in the processing unit. The
file system is not a standard file system but a simplified Log-structured file system
[22] that writes data very quickly and provides special advantages for caching data both
ways. Besides caching storage data in both directions, STICS also localizes SCSI commands
and handshaking operations to reduce unnecessary traffic over the Internet. In this way,
it acts as a storage filter that discards a fraction of the data that would otherwise
move across the Internet, reducing the bottleneck imposed by limited Internet bandwidth
and increasing the storage data rate. Apparent advantages of STICS are:

• It provides an iSCSI network cache to smooth out the traffic and improve overall
  performance. Such a cache or bridge is not only helpful but also necessary to a
  certain degree because of the different nature of SCSI and IP, such as speed, data
  unit size, protocols, and requirements. Wherever there is a speed disparity, cache
  helps. Analogous to "cache memory" used to cache memory data for the CPU, STICS is a
  "cache storage" used to cache networked storage data for the server host.

• Because STICS uses a log disk to cache data, it is a nonvolatile cache, which is
  extremely important for caching storage data reliably since once data is written to
  storage, it is considered to be safe.

• Although both NAS and STICS have storage devices and Ethernet interfaces, STICS is a
  perfect complement to NAS. NAS allows direct connection to an Ethernet to be accessed
  by networked computers, while STICS allows direct connection to a SCSI interface of a
  computer that in turn can access a SAN implemented over the Internet.

• By localizing part of the SCSI protocol and filtering out some unnecessary traffic,
  STICS can reduce the bandwidth requirement of the Internet to implement SAN.

• Active disks are becoming feasible and popular. STICS represents another specific and
  practical implementation of active disks.

• It is a standard plug-and-play building block for SAN over the Internet. If ISTORE [2]
  is a standard "brick" for building, then STICS can be considered a standard "beam" or
  "post" that provides interconnect and support of a construction (provided that the
  STICS is "big" and "strong" enough).

Overall, STICS adds a new dimension to networked storage architectures. To
quantitatively evaluate the performance potential of STICS in a real-world network
environment, we have partially implemented STICS under the Linux OS. While all network
operations of STICS are implemented over an Ethernet switch, cache algorithms and file
system functions are simulated because of time limits. We have used the PostMark [10]
benchmark and EMC's trace to measure system performance. PostMark results show that
STICS provides 53% to 78% performance improvement over the iSCSI implementation
in terms of average system throughput. An order of magnitude performance gain is
observed for 90% of I/O requests under EMC's trace in terms of response time.

The paper is organized as follows. The next section presents the architecture and overall
design of STICS. Section 3 presents our initial experiments and performance evaluations.
We discuss related research work in Section 4 and conclude our paper in Section 5.

2. Architecture and Design of STICS

Figure 1 shows a typical SAN implementation over IP using STICS. Any number of
storage devices or server computers can be connected to the standard Internet through
STICS to form the SAN. Instead of using a specialized network or specialized switch,
STICS connects a regular host server or a storage device to the standard IP network.
Consider STICS 1 in the diagram. It is directly connected to the SCSI HBA of Host 1 as a
local storage device. It also acts as a cache and bridge to allow Host 1 to access, at
block level, any storage device connected to the SAN. To allow smooth data access,
STICS 1 provides SCSI protocol service, caching service, naming service, and IP protocol
service.




Figure 1: A STICS connects to the host via the SCSI interface and connects to other
STICS units or NAS via the Internet.

The basic structure of STICS is shown in Figure 2. It consists of five main components:

1) A SCSI interface: STICS supports SCSI communications with hosts and other extended
   storage devices. Via the SCSI interface, STICS may run under two different modes:
   initiator mode or target mode [25]. When a STICS is used to connect to a host, it
   runs in target mode, receiving requests from the host, carrying out the I/O
   processing, possibly through the network, and sending back results to the host. In
   this case, the STICS acts as a directly attached storage device to the host. When a
   STICS is used to connect to a storage device such as a disk or RAID to extend
   storage, it runs in initiator mode, and it sends or forwards SCSI requests to the
   extended storage device. For example, in Figure 1, STICS 1 runs in target mode while
   STICS 2 runs in initiator mode.

2) An Ethernet interface: Via the network interface, a STICS can be connected to the
   Internet and share storage with other STICS units or network attached storage (NAS).

3) An intelligent processing unit: This processing unit has an embedded processor and
   RAM. A specialized Log-structured file system, standard SCSI protocols, and IP
   protocols run on the processing unit. The RAM is mainly used as a buffer cache. A
   small NVRAM (1-4MB) is also used to maintain the metadata such as the hash table,
   LRU list, and the mapping information (STICS_MAP). These metadata are stored in
   this NVRAM before being written to disks. The use of the NVRAM avoids frequently
   writing and reading metadata to/from disks. Alternatively, we can also use the Soft
   Updates [3] technique to keep metadata consistent without using NVRAM.

4) A log disk: The log disk is a sequentially accessed device. It is used to cache data
   along with the RAM above in the processing unit. The log disk and the RAM form a
   two-level hierarchical cache similar to DCD [7, 8].

5) Storage device: The regular storage device can be a disk, a RAID, or JBOD
   (Just-Bunch-Of-Disks). This storage device forms the basic storage component in a
   networked storage system. From the point of view of a server host to which the STICS
   is connected through the SCSI interface, this storage device can be considered a
   local disk. From the point of view of the IP network, through the network interface,
   this storage can be considered a component of a networked storage system such as a
   SAN with an IP address as its ID.




Figure 2: STICS architecture. A STICS block consists of a log disk, storage device, and
an intelligent processing unit with an embedded processor and sufficient RAM. Each STICS
block has one SCSI interface and one network interface.




2.1 STICS naming service

To allow true "any-to-any" communication between servers and storage devices, a
global naming scheme is necessary. In our design, each STICS is named by a global
location number (GLN) which is unique for each STICS. Currently we assign an IP address
to each STICS and use this IP as the GLN.



2.2 Cache Structure of STICS

The cache organization in STICS consists of a two-level hierarchy: a RAM cache and a
log disk. Frequently accessed data reside in the RAM, which is organized as an LRU cache
as shown in Figure 3. Whenever the newly written data in the RAM are sufficiently large
or whenever the log disk is free, data are written into the log disk. There are also less
frequently accessed data kept in the log disk. Data in the log disk are organized in the
format of segments similar to that in a Log-structured File System [22]. A segment
contains a number of slots, each of which can hold one data block.



Figure 3: The hash table, LRU list, and Free list are used to organize the slot entries.



To locate data in the cache, a data buffer consisting of slot entries and slots is
maintained. Data blocks stored in the RAM cache are addressed by their logical block
addresses (LBAs). The Hash Table contains location information for each of the valid
data blocks in the cache and uses LBAs of incoming requests as search keys. The slot size
is set to be the size of a block. A slot entry consists of the following fields:

• An LBA entry that is the LBA of the cache line and serves as the search key of the
  hash table;

• Global Location Number (GLN) if the slot contains data from or to another STICS;

• A log disk LBA, which is divided into 2 parts:

1) A state tag (2 bits), used to specify where the slot data is: IN_RAM_BUFFER,
   IN_LOG_DISK, IN_DATA_DISK or IN_OTHER_STICS;

2) A log disk block index (30 bits), used to specify the log disk block number if the
   state tag indicates IN_LOG_DISK. The size of each log disk can be up to 2^30
   blocks;

• Two pointers (hash_prev and hash_next) are used to link the hash table;

• Two pointers (prev and next) are used to link the LRU list and Free list;

• A Slot-No is used to describe the in-memory location of the cached data.
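
The slot entry fields listed above can be sketched as a small data structure. This is a minimal illustration in Python, not the authors' kernel code: the numeric values of the state tags, the placement of the 2-bit tag in the high bits of a 32-bit word, and the helper names pack_log_lba and unpack_log_lba are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class SlotState(IntEnum):
    # Numeric values are assumed; the text only names the four states.
    IN_RAM_BUFFER = 0
    IN_LOG_DISK = 1
    IN_DATA_DISK = 2
    IN_OTHER_STICS = 3


INDEX_BITS = 30                       # width of the log disk block index
INDEX_MASK = (1 << INDEX_BITS) - 1    # allows up to 2**30 blocks per log disk


def pack_log_lba(state: SlotState, block_index: int) -> int:
    """Combine the 2-bit state tag and 30-bit block index into one word."""
    if not 0 <= block_index <= INDEX_MASK:
        raise ValueError("block index does not fit in 30 bits")
    return (int(state) << INDEX_BITS) | block_index


def unpack_log_lba(word: int) -> tuple:
    """Split a packed word back into (state, block_index)."""
    return SlotState(word >> INDEX_BITS), word & INDEX_MASK


@dataclass
class SlotEntry:
    lba: int                              # search key in the hash table
    slot_no: int                          # in-memory location of the cached data
    log_disk_lba: int = 0                 # packed state tag + block index
    gln: Optional[str] = None             # set when data belongs to another STICS
    hash_prev: Optional["SlotEntry"] = None
    hash_next: Optional["SlotEntry"] = None
    prev: Optional["SlotEntry"] = None    # LRU / Free list links
    next: Optional["SlotEntry"] = None
```

Packing the tag and index into one word keeps each entry compact, which matters if the entries are kept in a small NVRAM as described earlier.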



Figure 4: SCSI initiator and target sub-systems. The initiator sends SCSI commands/data
to the target, and the target returns SCSI responses/data.



2.3 STICS modes

As we mentioned above, a STICS may run under two modes: initiator mode or target
mode. The SCSI initiator and target sub-systems are shown in Figure 4. In target mode, a
STICS is connected to a host. Otherwise, a STICS runs in initiator mode. Initiator mode
is the default mode of SCSI. All server host platforms including Linux support SCSI
initiator mode. We use the standard SCSI initiator mode in our STICS. The SCSI target
runs in parallel to the initiator and is concerned only with the processing of SCSI
commands. We define a set of target APIs for STICS. These APIs include SCSI functions
such as SCSI_DETECT, SCSI_RELEASE, SCSI_READ, and SCSI_WRITE. When running under target
mode, a STICS looks like a standard SCSI device to a connected host.

2.4 Basic operations

For each STICS, we define a variable STICS_LOAD to represent its current load. The
higher the STICS_LOAD, the busier the STICS is. When a STICS system starts, its
STICS_LOAD is set to zero. When the STICS accepts a request, STICS_LOAD is
incremented, and when a request finishes, STICS_LOAD is decremented. Besides
STICS_LOAD, we define a STICS_MAP to map all STICS loads within the network.
STICS_MAP is a set of <GLN, STICS_LOAD> pairs. The STICS_MAP is updated
dynamically.
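
The load bookkeeping described above can be sketched as follows. This is an illustrative Python model only: the class name Stics, its method names, and the dictionary form of STICS_MAP are assumptions, and how the map is exchanged between units over the network is not modeled.

```python
class Stics:
    """Tracks the load counter of one STICS unit (illustrative only)."""

    def __init__(self, gln: str):
        self.gln = gln      # global location number, e.g. an IP address
        self.load = 0       # STICS_LOAD starts at zero

    def accept_request(self) -> None:
        self.load += 1      # one more outstanding request

    def finish_request(self) -> None:
        if self.load == 0:
            raise RuntimeError("no outstanding request to finish")
        self.load -= 1


def build_stics_map(units) -> dict:
    """Return STICS_MAP as a {GLN: STICS_LOAD} dictionary."""
    return {u.gln: u.load for u in units}
```

Rebuilding the map from the current counters is the simplest way to keep it up to date in a single process; a distributed deployment would need some exchange protocol that the text does not specify.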

2.4.1 Write

Write requests may come from one of two sources: the host via the SCSI interface and
another STICS via the Ethernet interface. The operations of these two types of writes are
as follows.

Write requests from the host via SCSI interface: After receiving a write request, the
STICS first searches the Hash Table by the LBA address. If an entry is found, the entry
is overwritten by the incoming write. Otherwise, a free slot entry is allocated from the
Free list, the data are copied into the corresponding slot, and its address is recorded in
the Hash Table. The LRU list and Free list are then updated. When enough data slots (16
in our preliminary implementation) are accumulated or when the log disk is idle, the data
slots are written into the log disk sequentially in one large write. After the log write
completes successfully, STICS signals the host that the request is complete.
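
The host write path can be sketched as below. This is a simplified Python model under stated assumptions, not the authors' implementation: an OrderedDict stands in for both the Hash Table and the LRU list, a Python list stands in for the log disk, and the class and method names are hypothetical. Only the 16-slot threshold comes from the text.

```python
from collections import OrderedDict

FLUSH_THRESHOLD = 16   # slots accumulated before a log write (from the text)


class WriteCache:
    def __init__(self, num_slots: int):
        self.free_slots = list(range(num_slots))   # Free list
        self.table = OrderedDict()   # LBA -> (slot_no, data); order doubles as LRU
        self.dirty = []              # LBAs not yet written to the log disk
        self.log = []                # each element models one sequential log write

    def write(self, lba: int, data: bytes, log_idle: bool = False) -> bool:
        """Cache one block; return True once it has reached the log disk."""
        if lba in self.table:
            slot_no, _ = self.table[lba]          # overwrite the existing entry
        else:
            if not self.free_slots:
                raise MemoryError("no free slot; a destage would be needed first")
            slot_no = self.free_slots.pop()       # allocate from the Free list
        self.table[lba] = (slot_no, data)
        self.table.move_to_end(lba)               # mark as most recently used
        if lba not in self.dirty:
            self.dirty.append(lba)
        if len(self.dirty) >= FLUSH_THRESHOLD or log_idle:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        """Write all pending slots to the log in one large sequential write."""
        if self.dirty:
            self.log.append([(lba, self.table[lba][1]) for lba in self.dirty])
            self.dirty.clear()
```

In this sketch a False return means the block is still waiting in RAM; a fuller model would acknowledge those earlier writes to the host when the batch they belong to is flushed.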

Write requests from another STICS via Ethernet interface: A packet coming from the
network interface may turn out to be a write operation from a remote STICS on the
network. After receiving such a write request and unpacking the network packet, STICS
gets a data block with GLN and LBA. It then searches the Hash Table by the LBA and
GLN. If an entry is found, the entry is overwritten by the incoming write. Otherwise, a
free slot entry is allocated from the Free list, and the data are then copied into the
corresponding slot. The Hash Table, LRU list, and Free list are updated accordingly.

2.4.2 Read

Similar to write operations, read operations may also come either from the host via the
SCSI interface or from another STICS via the Ethernet interface.

Read requests from the host via SCSI interface: After receiving a read request, the
STICS searches the Hash Table by the LBA to determine the location of the data. Data
requested may be in one of four different places: the RAM buffer, the log disk(s), the
storage device in the local STICS, or a storage device in another STICS on the network.
If the data is found in the RAM buffer, the data are copied from the RAM buffer to the
requesting buffer. The STICS then signals the host that the request is complete. If the
data is found in the log disk or the local storage device, the data are read from the log
disk or storage device into the requesting buffer. Otherwise, the STICS encapsulates the
request, including the LBA, current GLN, and destination GLN, into an IP packet and
forwards it to the corresponding STICS.
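
The four-way lookup for host reads can be sketched as follows. This Python fragment is illustrative only: plain dictionaries stand in for the RAM buffer and disks, forward is a hypothetical callable that sends a request to a remote STICS and returns the data, and treating a Hash Table miss as "on the local storage device" is an assumption the text does not state.

```python
from enum import Enum


class Location(Enum):
    RAM_BUFFER = "ram"
    LOG_DISK = "log"
    DATA_DISK = "data"
    OTHER_STICS = "remote"


def read_block(lba, hash_table, ram, log_disk, data_disk, local_gln, forward):
    """Return the data for `lba`, following the four-way lookup in the text.

    hash_table maps LBA -> (Location, detail). For LOG_DISK the detail is the
    log disk block index; for OTHER_STICS it is the destination GLN.
    """
    # Assumption: an LBA missing from the table lives on the local data disk.
    location, detail = hash_table.get(lba, (Location.DATA_DISK, None))
    if location is Location.RAM_BUFFER:
        return ram[lba]
    if location is Location.LOG_DISK:
        return log_disk[detail]
    if location is Location.DATA_DISK:
        return data_disk[lba]
    # Remote case: package LBA, current GLN, and destination GLN, then forward.
    packet = {"lba": lba, "src_gln": local_gln, "dst_gln": detail}
    return forward(packet)
```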

Read requests from another STICS via Ethernet interface: When a read request is
found after unpacking an incoming IP packet, the STICS obtains the GLN and LBA from
the packet. It then searches the Hash Table by the LBA and the source GLN to determine
the location of the data. It locates and reads data from that location. Finally, it sends
the data back to the source STICS through the network.

2.4.3 Destages

The operation of moving data from a higher-level storage device to a lower-level storage
device is defined as a destage operation. There are two levels of destage operations in
STICS: destaging data from the RAM buffer to the log disk (Level 1 destage) and
destaging data from the log disk to a storage device (Level 2 destage). We implement a
separate kernel thread, LogDestage, to perform the destaging tasks. The LogDestage
thread is registered during system initialization and monitors the STICS states. The
thread sleeps most of the time and is activated when one of the following events occurs:
1) the number of slots in the RAM buffer exceeds a threshold value, 2) the log disk is
idle, 3) the STICS detects an idle period, 4) the STICS RAM buffer and/or the log disk
becomes full. Level 1 destage has higher priority than Level 2 destage. Once the Level 1
destage starts, it continues until a log of data in the RAM buffer is written to the log
disk. Level 2 destage continues until the log disk becomes empty, but may be interrupted
if a new request comes in. If the destage process is interrupted, the destage thread is
suspended until the STICS detects another idle period.

As for Level 1 destage, the data in the RAM buffer are written to the log disk
sequentially in large size (64KB). The log disk header and the corresponding in-memory
slot entries are updated. All data are written to the log disk in "append" mode, which
ensures that every time the data are written to consecutive log disk blocks.

For Level 2 destage, we use a "last-write-first-destage" algorithm according to the LRU
list. At this point, we choose a STICS with the lowest STICS_LOAD to accept data. Each
time, 64KB of data are read from the consecutive blocks of the log disk and written to
the chosen STICS storage disks. The LRU list and Free list are updated subsequently.
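
The target selection and chunking used by Level 2 destage can be sketched as below. This is a minimal Python illustration under assumptions: STICS_MAP is modeled as a {GLN: load} dictionary, send is a hypothetical transport callback, ties are broken by dictionary order, and the "last-write-first-destage" ordering of blocks is not modeled.

```python
CHUNK_BYTES = 64 * 1024   # destage unit used in the text


def pick_destage_target(stics_map: dict) -> str:
    """Return the GLN with the lowest STICS_LOAD (first one wins on ties)."""
    if not stics_map:
        raise ValueError("STICS_MAP is empty")
    return min(stics_map, key=stics_map.get)


def destage_level2(log_bytes: bytes, stics_map: dict, send) -> tuple:
    """Move log contents to the least-loaded STICS in 64KB chunks.

    `send(gln, chunk)` is a hypothetical transport callback.
    Returns (target GLN, number of chunks sent).
    """
    target = pick_destage_target(stics_map)
    chunks = 0
    for offset in range(0, len(log_bytes), CHUNK_BYTES):
        send(target, log_bytes[offset:offset + CHUNK_BYTES])
        chunks += 1
    return target, chunks
```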



3. Performance Evaluations 
3.1 Methodology

Our experimental settings for the purpose of evaluating the performance of iSCSI and
STICS are shown in Figures 5 and 6. Three PCs are involved in our experiments, namely
Trout, Cod and Squid. For iSCSI, Trout serves as the host and Squid as the iSCSI target
as shown in Figure 5. For the STICS experiment, we add our STICS simulator between the
host and the target as shown in Figure 6. The STICS simulator runs on Cod and caches data
and iSCSI communications. All these machines are interconnected through a 100Mbps switch
to form an isolated LAN. Each machine is running Linux kernel 2.4.2 with a 3c905 TX
100Mbps network interface card (NIC) and an Adaptec 39160 high performance SCSI adaptor.
The configurations of these machines are described in Table 1 and the characteristics of
individual disks are summarized in Table 2.




Figure 5: iSCSI configuration. The host Trout establishes a connection to the target,
and the target Squid responds and connects. Then Squid exports its hard drive and Trout
sees the disks as local.

For the iSCSI implementation, we compiled and ran the Linux iSCSI developed by Intel
Corporation [9]. The iSCSI is compiled under Linux kernel 2.4.2 and configured as
shown in Figure 5. There are 4 steps for the two machines to establish communications
via iSCSI. First, the host establishes a connection to the target; second, the target
responds and connects; third, the target machine exports its disks; and finally the host
sees these disks as local. All these steps are finished through socket communications.
After these steps, the iSCSI is in "full feature phase" mode where SCSI commands and
data can be exchanged between the host and the target. For each SCSI operation, there
will be at least 4 socket communications as follows: 1) The host encapsulates the SCSI
command into a packet data unit (PDU) and sends this PDU to the target; 2) The target
receives and decapsulates the PDU. It then encapsulates a response into a PDU and sends
it back to the host; 3) The host receives and decapsulates the response PDU. It then
encapsulates the data into a PDU and sends it to the target if the target is ready to
transfer; 4) The target receives the data PDU and sends another response to the host to
acknowledge the finish of the SCSI operation.
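
The four-step exchange above can be written down as a short sequence, which makes the per-operation network cost easy to count. This is a toy Python model only: the tuple layout, the payload strings, and the function names are assumptions, and real iSCSI PDUs carry headers and state that are not modeled here.

```python
def iscsi_write_exchange(command: str, data: bytes) -> list:
    """List the PDUs that cross the network for one write, in order."""
    return [
        ("host->target", "command PDU", command),
        ("target->host", "response PDU", "ready to receive"),
        ("host->target", "data PDU", data),
        ("target->host", "response PDU", "operation finished"),
    ]


def network_crossings(num_writes: int) -> int:
    """Total PDUs on the wire for `num_writes` uncached iSCSI writes."""
    return num_writes * len(iscsi_write_exchange("WRITE", b""))
```

Counting crossings this way shows why localizing the handshake at a nearby cache can matter: every step in the list is a full trip across the IP network.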

Figure 6: STICS setup. The STICS simulator caches data from both SCSI and network.

We simulated our STICS using a PC running Linux kernel 2.4.2 with target mode support.
The STICS simulator is built on top of the DCD driver developed by us [7]. The SCSI
target mode function is borrowed from the SCSI target package [18]. We use 4 MB of the
system RAM to simulate the STICS NVRAM buffer, and the log disk is a standalone hard
drive. A hash table, an LRU list, and mapping information (STICS_MAP) are maintained in
this NVRAM. The STICS simulator can run under two modes. To the host (Trout), it runs in
target mode, and to the target storage (Squid), it runs in initiator mode. When SCSI
requests come from the host, the simulator first processes the requests locally. For
write requests, the simulator writes the data to its RAM buffer. When the log disk is
idle or the NVRAM is full, the data are destaged to the log disk through Level 1
destage. After the data are written to the log disk, STICS signals the host that the
write is complete. When the log disk is full or the system is idle, the data in the log
disk are destaged to the lower-level storage. At this point, the STICS decides whether to
store data locally or on the remote disks according to STICS_MAP. In our simulation we
store the data to local storage and remote disks with equal probability. The hash table
and LRU list, which reside in the NVRAM, are updated. When a read request comes in, the
STICS searches the hash table, locates the data, and accesses it from RAM, log disk,
local storage, or remote disks via the network.
3.2 Benchmark program and workload characteristics

It is important to use realistic workloads to drive our simulator for a fair performance
evaluation and comparison. For this reason, we chose to use a real-world trace and a
benchmark program.

The benchmark we used to measure system throughput is PostMark [10], which is a
popular file system benchmark developed by Network Appliance. It measures
performance in terms of transaction rates in an ephemeral small-file environment by
creating a large pool of continually changing files. "PostMark was created to simulate
heavy small-file system loads with a minimal amount of software and configuration effort
and to provide complete reproducibility [10]." PostMark generates an initial pool of
random text files ranging in size from a configurable low bound to a configurable high
bound. This file pool is of configurable size and can be located on any accessible file
system. Once the pool has been created, a specified number of transactions occurs. Each
transaction consists of a pair of smaller transactions, i.e. Create file or Delete file,
and Read file or Append file. Each transaction type and its affected files are chosen
randomly. The read and write block size can be tuned. On completion of each run, a report
is generated showing some metrics such as elapsed time, transaction rate, total number of
files created, and so on.

In addition to PostMark, we also used a real-world trace obtained from EMC Corporation.
The trace, referred to as the EMC-tel trace hereafter, was collected by an EMC Symmetrix
disk array system installed at a telecommunication consumer site. The trace file contains
230370 requests, with a fixed request size of 4 blocks. The trace is write-dominated with
a write ratio of 94.7%. In order for the trace to be read by our STICS and the iSCSI
implementation, we developed a program called ReqGenerator to convert the traces to
high-level I/O requests. These requests are then fed to our simulator and iSCSI system to
measure performance.



3.3.1 Throughput

Our first experiment is to use PostMark to measure the I/O throughput in terms of
transactions per second. In our tests, PostMark was configured in two different ways as
in [10]: first, a small pool of 1,000 initial files and 50,000 transactions; and second,
a large pool of 20,000 initial files and 100,000 transactions. The total sizes of accessed
data are 330MB (of which 168.38MB is write) and 740MB (303.46MB read and 436.18MB
write), respectively. They are much larger than the system RAM (128MB). The block size
was varied [...] mode. We left all other PostMark parameters at their default settings.

In Figure 7, we plotted two separate bar graphs corresponding to the small file pool
case and the large one, respectively. Each pair of bars represents the system throughput
of STICS and iSCSI. It is clear from this figure that STICS shows obviously better system
throughput than iSCSI for both the small pool and large pool cases. The performance gain
of STICS over iSCSI ranges from 53% to 78%.



Figure 7: PostMark measurements (throughput).



3.3.2 Response times

Our next experiment is to measure and compare the response times of STICS and iSCSI
under the EMC trace. In Figure 8a, we plotted a histogram of request numbers against
response times, i.e. the X-axis represents response time and the Y-axis represents the
number of storage requests finished within a particular response time. For example, a
point (X, Y) = (1000, 25000) means that there are 25,000 requests finished within 1000
microseconds. The lighter (blue) part of the figure is for STICS whereas the darker (red)
part is for iSCSI. To make it clearer, we also drew a bar graph representing the
percentage of requests finished within a given time as shown in Figure 8b. It is
interesting to note in this figure that STICS does an excellent job in smoothing out the
speed disparity between SCSI and IP. With STICS, over 80% of requests are finished within
1000 microseconds and most of them are finished within [...]. For iSCSI with no STICS,
about 51% of requests take over 4000 microseconds, about 46% take over 2000
microseconds, and almost no request finishes within 1000 microseconds. These measured
data are very [...].
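
The percentages quoted above are cumulative fractions of requests finished within a time limit. As a minimal sketch of how such a figure can be computed from raw per-request response times, the Python function below counts requests at or under a limit; the function name and the microsecond unit of the inputs are assumptions, and the paper's raw measurements are not reproduced here.

```python
def fraction_within(response_times_us, limit_us):
    """Fraction of requests whose response time is at most `limit_us`."""
    if not response_times_us:
        raise ValueError("no response times given")
    finished = sum(1 for t in response_times_us if t <= limit_us)
    return finished / len(response_times_us)
```

For instance, calling it with a limit of 1000 on a list of measured times gives the share of requests that finished within 1000 microseconds.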



While STICS improves the iSCSI performance by an order of magnitude for 90% of
storage requests, [...] the average response time of STICS is 3652 whereas the average
response time of iSCSI is about 6757. Figure 9 shows average response times of groups of
1000 requests each, i.e. we average the response times of every 1000 requests as a group
and show the average response times for all groups. In our experiments, we noticed that
5% of requests take over 8000 microseconds for STICS. Some requests even take up to
120,000 microseconds. These long response times are attributed to the destaging process:
[...] the destaging process to continue until [...] the log disk. We are still working on
the optimization of the destage algorithm. We believe there is [...].

Figure 9: Average response time of STICS and iSCSI.



4. Related Work 

Existing research that is most closely related to STICS is Network Attached Storage
(NAS) [4, 6] and Network-Attached Secure Disk (NASD) [5], a research project at
Carnegie Mellon University. Both technologies provide direct network connection for
clients to access through network interfaces and file system functionality. NAS-based
storage appliances range from [...]. These storages are generally simple to manage. NASD
provides secure interfaces via cryptographic support and divides NFS-like functionality
between a file manager and the disk drives. The file manager is responsible primarily for
verifying credentials and establishing access tokens. Clients can access all devices
directly and in parallel once approved by the file manager. As discussed in the
introduction of this paper, STICS presents a perfect complement to NAS. STICS provides a
direct SCSI connection to a server host to allow the server to access, at block level, a
SAN implemented over the Internet. In addition to being a storage component of the SAN, a
STICS performs network cache functions for a smooth and efficient SAN implementation over
an IP network. In a local area network (LAN), the LAN used to interconnect STICS and
various storage devices can be the same LAN as the one connecting servers accessing
storage, or [...].



Another related work is Petal [13, 23], a research project of Compaq's Systems Research
Center. Petal uses a collection of NAS-like storage servers [...] level. A virtual disk
is globally accessible to all Petal clients within the network. Petal was built using a
LAN protocol but logically designed as a SAN interface. Each storage server keeps a
global state, which makes management in a Petal system especially easy. iSCSI (Internet
SCSI) [6, 9, 21], which emerged very recently, provides an ideal alternative to Petal's
custom LAN-based SAN protocol. Taking advantage of existing Internet protocols and media,
it is a natural way for storage to make use of TCP/IP, as demonstrated by the earlier
research work of Meter et al. of USC, VISA [15], to transfer SCSI commands and data using
the IP protocol.

The iSCSI protocol is a mapping of the SCSI remote procedure invocation model over the
TCP/IP protocol [21]. The connection between the initiator and the target is established
when a session starts. The SCSI commands and data transferred between the initiator and
target need to be synchronized remotely. The STICS architecture attempts to localize
some of the SCSI protocol traffic by accepting SCSI commands and data from the host.
Filtered data blocks are sent to the storage target using the Internet. This
SCSI-in-Block-out mechanism provides an immediate and transparent solution both to the
host and the storage, eliminating some unnecessary remote synchronization. Furthermore,
STICS provides a nonvolatile cache exclusively for SCSI commands and data that are
supposed to be transferred through the network. This cache will [...] point of view as
well as avoid many unnecessary [...] data are frequently overwritten.



5. Conclusions and Future Work

In this paper, we have introduced a new concept, "SCSI-To-IP cache storage" (STICS), to
bridge the disparities between SCSI and IP in order to facilitate implementation of SAN
over the Internet. STICS adds a new dimension to networked storage architectures,
allowing any server host to efficiently access a SAN on the Internet through a standard
SCSI interface. Using a nonvolatile "cache storage", STICS smooths out the storage data
traffic between SCSI and IP very much like the way "cache memory" smooths out
CPU-memory traffic. We have carried out a partial implementation and simulation of
STICS under the Linux operating system. While the caching algorithms and system
operations inside a STICS are simulated using simulators, SCSI protocols, data transfers,
and iSCSI protocols are actually implemented using a standard SCSI HBA, Ethernet
controller cards, and an Ethernet switch. We measured the performance of STICS as
compared to a typical iSCSI implementation using a popular benchmark (PostMark) and
a real-world I/O workload (EMC's trace). PostMark results have shown that STICS
outperforms iSCSI by 53%-78% in terms of average system throughput. Numerical
results under EMC's trace show an order of magnitude performance gain for 90% of
storage requests in terms of response time. Furthermore, STICS is a plug-and-play
building block for storage networks.

We are currently in the process of completely building the STICS box using the Linux
operating system. Besides performance and reliability, manageability, adaptivity, and
scalability are under consideration [25].